ABSTRACT

The effectiveness of Stout's procedure for assessing 
latent trait unidimensionality was studied. Strong empirical evidence 
of the utility of the statistical test in a variety of settings is 
provided. The procedure was modified to correct for increased bias, 
and a new algorithm to determine the size of assessment sub-tests was 
used. The following two issues were addressed via a Monte Carlo 
simulation: (1) the ability to approximate the nominal level of 
significance via the observed level of significance; and (2) the 
power of the statistical test while undergoing the modifications. 
Results indicate that Stout's statistic and procedure, which grew
directly out of a meaningful conceptual definition of test
dimensionality, avoid the issue of parametric model correctness,
are supported by asymptotic theory, have modest computational
requirements, and are supported by Monte Carlo simulations.
(TJH) 
Setting: A standardized test of N items is administered to J examinees. Each
item is scored right or wrong:

[$U_i$ = 1] = [i-th item right]

Each examinee is characterized by an unobservable latent ability $\theta$, usually
assumed to be a continuous, unidimensional random variable.

Assumptions Underlying the Item Response Theory Models:

1. Unidimensionality, d = 1

2. $P_i(\theta)$ increases in $\theta$ ("monotonicity")

3. $P[U = u \mid \theta] = \prod_{i=1}^{N} P[U_i = u_i \mid \theta]$ for all $u$ and $\theta$
(local independence)

Most Commonly Used Item Characteristic Curve

$$P_i(\theta) = P[U_i = 1 \mid \theta] = c_i + \frac{1 - c_i}{1 + e^{-1.7 a_i (\theta - b_i)}}$$

where $a_i$ = discriminatory power of the item,

$b_i$ = location parameter, or difficulty, of the item,
and $c_i$ = lower asymptote, or pseudo-guessing parameter, of the item.
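
As a small illustration of the curve above, the following sketch evaluates a three-parameter logistic item characteristic curve. The function name `icc_3pl` and the example parameter values are hypothetical, and the scaling constant 1.7 is assumed to be the conventional one; pass `scale=1.0` to drop it.

```python
import math


def icc_3pl(theta, a, b, c, scale=1.7):
    """Probability of a correct response under a 3PL item characteristic curve.

    a: discrimination, b: difficulty, c: lower asymptote (pseudo-guessing).
    scale=1.7 is an assumed conventional constant, not taken from the text.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-scale * a * (theta - b)))


# Hypothetical item: a = 1.0, b = 0.0, c = 0.2.
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(icc_3pl(theta, a=1.0, b=0.0, c=0.2), 3))
```

At $\theta = b$ the curve sits halfway between the lower asymptote c and 1, which is a quick sanity check on any implementation.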

The Definition of Unidimensionality : 

What is meant by unidimensionality is that only one dominant dimension 
or attribute influences the test performance. The dominant attribute 
results when an attribute is common to many items in a test. Stout 
(Psychometrika, Dec 87) defines unidimensionality in terms of dominant 
dimension as follows: 

A test $(U_1, U_2, \ldots, U_N)$ of length N is said to be essentially
unidimensional if there exists a latent variable $\theta$ such that for all values
of $\theta$,

$$\frac{1}{N(N-1)} \sum_{1 \le i \ne j \le N} \left| \mathrm{Cov}(U_i, U_j \mid \theta) \right| \approx 0. \tag{1.1}$$
This is to say that on the average the conditional covariances should be
small. Stout further defines an empirical notion of unidimensionality
consistent with Equation 1.1 as follows:

A test $(U_1, U_2, \ldots, U_N)$ is unidimensional if for all subtests
$(U_{k_1}, U_{k_2}, \ldots, U_{k_M})$ of length M (< N) and all values of $Y_p$,

$$\frac{1}{M(M-1)} \sum_{1 \le i \ne j \le M} \mathrm{Cov}(U_{k_i}, U_{k_j} \mid Y_p) \approx 0 \tag{1.2}$$

where $Y_p$ is the proportion correct on the long subtest complementary to
$(U_{k_1}, U_{k_2}, \ldots, U_{k_M})$, with length n = N-M.

The above definition suggests splitting the items into two subtests
such that failure of Equation 1.2 is evidence of lack of unidimensionality.
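
The quantity in Equation 1.2 can be estimated directly from scored responses. The sketch below is illustrative only: the function name `mean_conditional_cov` is hypothetical, examinees are conditioned on their exact number-correct score on the complementary items, and the per-group averages are weighted by group size, which is an assumption rather than part of the definition.

```python
from collections import defaultdict
from itertools import combinations


def mean_conditional_cov(responses, subtest):
    """Average pairwise covariance of the subtest items, computed within
    groups of examinees sharing the same score on the remaining items.

    responses: list of 0/1 rows (one row per examinee).
    subtest:   list of item indices forming the short subtest.
    Groups are weighted by size (an assumption made for this sketch).
    """
    n_items = len(responses[0])
    rest = [i for i in range(n_items) if i not in subtest]
    groups = defaultdict(list)
    for row in responses:
        groups[sum(row[i] for i in rest)].append(row)

    total, weight = 0.0, 0
    for rows in groups.values():
        n = len(rows)
        if n < 2:
            continue  # a single examinee carries no covariance information
        means = {i: sum(r[i] for r in rows) / n for i in subtest}
        covs = [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
            for i, j in combinations(subtest, 2)
        ]
        total += n * sum(covs) / len(covs)
        weight += n
    return total / weight


# Two examinees with identical complementary scores and perfectly
# associated subtest items give a covariance of 0.25.
print(mean_conditional_cov([[1, 1, 0], [0, 0, 0]], [0, 1]))
```

Values near zero are what Equation 1.2 leads one to expect for a unidimensional test; clearly positive values point toward a second dimension shared by the subtest items.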

Stout's Test for Assessing Unidimensionality

$H_0$: d = 1 vs. $H_1$: d > 1

The main underlying assumptions:

1. Examinees are randomly selected from a large population.

2. Examinees respond to items independently of one another.

3. Monotonicity.

4. Local independence (LI): for examinees of the same ability, responses
to different items are independent.

Stout's procedure is nonparametric, but the 3PL model was used in the simulation studies.
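
To make the simulation setting concrete, here is a minimal sketch of generating scored responses from a unidimensional 3PL model. The function name `simulate_3pl`, the standard normal ability distribution, the scaling constant 1.7, and the item parameters are all assumptions chosen for illustration; they are not the settings of the studies reported here.

```python
import math
import random


def simulate_3pl(n_examinees, items, seed=0):
    """Return one 0/1 response row per examinee.

    items: list of (a, b, c) tuples. Abilities are drawn from N(0, 1),
    which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)
        row = []
        for a, b, c in items:
            p = c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


# Hypothetical 20-item test with difficulties spread from -1.0 to 0.9.
items = [(1.0, -1.0 + 0.1 * i, 0.2) for i in range(20)]
responses = simulate_3pl(500, items)
print(len(responses), len(responses[0]))
```

Given the ability, each response is drawn independently, so local independence holds by construction in this d = 1 setup.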



Test Procedure

Step 1: Split the N test items into 3 subtests: two short assessment subtests of
length M each, and a long partitioning subtest of length n (= N-2M).

AT1: Assessment 1 subtest, length M. Choose items into this subtest so that
they are as unidimensional as possible. This can be done either on the
basis of expert opinion or using factor analysis as a data analytic tool.
The purpose of AT1 is to compute Stout's statistic.
AT2: Assessment 2 subtest, length M. After items are selected into AT1, items
are selected into AT2 so that they have the same difficulty distribution as
those in AT1. The purpose of AT2 is to correct for the pre-asymptotic
statistical bias in Stout's statistic.

PT: Partitioning subtest, length n. After items are selected into AT1 and AT2,
the remaining n = N-2M items are put in PT. The purpose of PT is to group
examinees into subgroups. When d = 1 and the test is long, each subgroup
will consist of examinees of approximately equal ability.

Example 1: d = 1, N = 30, say all verbal items.

AT1 = 5V; AT2 = 5V; PT = 20V

Example 2: d = 2, N = 30, say 10 math (M), 10 verbal (V), and
10 mixed (X) items.

AT1 = 5V; AT2 = 1V, 2M, 2X; PT = 4V, 8M, 8X.
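
One simple way to give AT2 roughly the same difficulty distribution as the first assessment subtest is to pair each assessment item with the unused item closest to it in proportion correct. The sketch below shows that greedy matching; it is an assumption made for illustration, not necessarily the matching rule used in Stout's procedure, and the name `match_at2` is hypothetical.

```python
def match_at2(p_correct, at1):
    """Greedy difficulty matching (illustrative assumption).

    p_correct: proportion correct for every item in the test.
    at1:       indices of the items already chosen for the first
               assessment subtest.
    Returns (at2, pt): the matched items and the leftover partitioning items.
    """
    available = [i for i in range(len(p_correct)) if i not in at1]
    at2 = []
    for i in at1:
        # Pick the unused item whose difficulty is closest to item i.
        best = min(available, key=lambda j: abs(p_correct[j] - p_correct[i]))
        at2.append(best)
        available.remove(best)
    return at2, available


# Hypothetical proportions correct for an 8-item test.
p = [0.90, 0.50, 0.30, 0.88, 0.52, 0.31, 0.70, 0.60]
at2, pt = match_at2(p, at1=[0, 1, 2])
print(at2, pt)
```

With a realistic test length the partitioning subtest would be much longer than either assessment subtest, as in the examples above.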



Step 2 : Assign examinees into different subgroups according to their scores 
on PT. 
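
Step 2 can be sketched as follows, assuming examinees are grouped by their exact number-correct score on PT. The function name `group_by_pt_score` and the `min_size` threshold for discarding very small subgroups are hypothetical choices for this illustration.

```python
from collections import defaultdict


def group_by_pt_score(responses, pt_items, min_size=2):
    """Map each PT number-correct score to the indices of the examinees
    who obtained it, keeping only subgroups with at least min_size members
    (the threshold is an assumption of this sketch)."""
    groups = defaultdict(list)
    for j, row in enumerate(responses):
        groups[sum(row[i] for i in pt_items)].append(j)
    return {score: idx for score, idx in groups.items() if len(idx) >= min_size}


# Four examinees, three items; items 0 and 1 form the partitioning subtest.
data = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 0]]
print(group_by_pt_score(data, pt_items=[0, 1]))
```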



Step 3: Within each examinee subgroup, estimate examinee variation on AT1
in two ways:

Let $U_{ijk}$ denote the response of the j-th examinee to the i-th item from
subgroup k. Let $J_k$ denote the number of examinees in subgroup k and let K
denote the total number of subgroups.

$$\hat\sigma_k^2 = \sum_{j=1}^{J_k} \left( Y_j^{(k)} - \bar Y^{(k)} \right)^2 / J_k$$

is the usual variance estimate for subgroup k, where
$Y_j^{(k)} = \sum_{i=1}^{M} U_{ijk} / M$ and $\bar Y^{(k)} = \sum_{j=1}^{J_k} Y_j^{(k)} / J_k$, and

$$\hat\sigma_{U,k}^2 = \sum_{i=1}^{M} \hat p_i^{(k)} \left( 1 - \hat p_i^{(k)} \right) / M^2$$

is the unidimensional variance estimate for subgroup k, where
$\hat p_i^{(k)} = \sum_{j=1}^{J_k} U_{ijk} / J_k$.

Note: Under $H_0$, if the test is long, the long partitioning subtest ensures
that examinees within each subgroup are of approximately equal ability, so the
assumption of local independence is closely met, leading to
$\hat\sigma_k^2 \approx \hat\sigma_{U,k}^2$.

Under $H_1$, however, the assumption of local independence is badly
violated, leading to $\hat\sigma_k^2 \gg \hat\sigma_{U,k}^2$.
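
The two variance estimates of Step 3 can be sketched for a single subgroup as below. The function name `subgroup_variances` is hypothetical; the usual variance is taken with divisor $J_k$ and the unidimensional variance as the sum of the item variances divided by $M^2$.

```python
def subgroup_variances(rows):
    """rows: 0/1 responses of one subgroup to the M assessment items.

    Returns (usual variance of proportion-correct scores,
             unidimensional variance built from item proportions).
    """
    J, M = len(rows), len(rows[0])
    y = [sum(r) / M for r in rows]                    # proportion correct per examinee
    y_bar = sum(y) / J
    var_usual = sum((v - y_bar) ** 2 for v in y) / J
    p = [sum(r[i] for r in rows) / J for i in range(M)]  # proportion correct per item
    var_unidim = sum(q * (1 - q) for q in p) / M ** 2
    return var_usual, var_unidim


# Perfectly associated items inflate the usual variance ...
print(subgroup_variances([[1, 1], [0, 0]]))
# ... while negatively associated items push it below the unidimensional one.
print(subgroup_variances([[1, 0], [0, 1]]))
```

With these divisors, the difference between the two estimates equals the sum of the within-subgroup item covariances over all ordered pairs $i \ne j$, divided by $M^2$, which is why a large positive difference signals a violation of local independence.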



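Step 4: Compute the statistic $T_L$ (L for long test) for items in the AT1 subtest:

$$T_L = \frac{1}{\sqrt{K}} \sum_{k=1}^{K} \left[ \frac{\hat\sigma_k^2 - \hat\sigma_{U,k}^2}{S_k} \right] \tag{1.3}$$

where $S_k$ is the appropriate normalizing constant given by:

$$S_k = \left[ \frac{\left( \hat\mu_{4,k} - \hat\sigma_k^4 \right) + \hat\delta_{4,k} / M^4}{J_k} \right]^{1/2}$$

where

$$\hat\mu_{4,k} = \sum_{j=1}^{J_k} \left( Y_j^{(k)} - \bar Y^{(k)} \right)^4 / J_k, \qquad \hat\delta_{4,k} = \sum_{i=1}^{M} \hat p_i^{(k)} \left( 1 - \hat p_i^{(k)} \right) \left( 1 - 2 \hat p_i^{(k)} \right)^2.$$
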
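
Equation 1.3 can be sketched in code as follows. This is an illustration under stated assumptions: the normalizing constant is taken to be $S_k^2 = [(\hat\mu_{4,k} - \hat\sigma_k^4) + \hat\delta_{4,k}/M^4]/J_k$, subgroups with a non-positive $S_k^2$ are skipped, and K counts only the subgroups actually used. The function name `stout_statistic` is hypothetical.

```python
import math


def stout_statistic(subgroups):
    """subgroups: list of subgroups, each a list of 0/1 rows on the
    assessment subtest. Returns the standardized sum over subgroups."""
    terms = []
    for rows in subgroups:
        J, M = len(rows), len(rows[0])
        y = [sum(r) / M for r in rows]
        y_bar = sum(y) / J
        var_usual = sum((v - y_bar) ** 2 for v in y) / J
        mu4 = sum((v - y_bar) ** 4 for v in y) / J
        p = [sum(r[i] for r in rows) / J for i in range(M)]
        var_unidim = sum(q * (1 - q) for q in p) / M ** 2
        delta4 = sum(q * (1 - q) * (1 - 2 * q) ** 2 for q in p)
        s_sq = ((mu4 - var_usual ** 2) + delta4 / M ** 4) / J
        if s_sq > 0:  # skip degenerate subgroups (assumption of this sketch)
            terms.append((var_usual - var_unidim) / math.sqrt(s_sq))
    return sum(terms) / math.sqrt(len(terms))


# One tiny subgroup of four examinees on a two-item assessment subtest.
t_l = stout_statistic([[[1, 1], [0, 0], [1, 0], [1, 1]]])
print(round(t_l, 4))
```

Real applications involve many subgroups with far more examinees each; the tiny subgroup here only makes the arithmetic easy to follow by hand.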
Step 5: Repeat steps 3 and 4 for items in AT2 and compute the statistic $T_B$
(B for bias correction) according to Equation 1.3.



Step 6: Perform the test for unidimensionality.

Stout's test for unidimensionality is given by

$$T = (T_L - T_B) / \sqrt{2}. \tag{1.4}$$

Reject $H_0$: d = 1 if $T > Z_\alpha$, where $Z_\alpha$ is the upper $100(1-\alpha)$ percentile for



items in this case, subgroups tended to have examinees varying highly in
the ability being tested. This led to a severe violation of the assumption of
local independence within subgroups. Hence the procedure performed badly in
the d = 1 case.

Correction for Increased Bias 

It was observed that the subgroups would be more desirable if examinees
were placed into them on the basis of their scores on items that are not
all difficult. This can be achieved in the following way.

1. After items are selected into AT1, they are checked statistically to see if
they are too easy as a group.

2. If so, they are replaced with the items having the highest loadings of
opposite sign, so that they are still as unidimensional as possible when d = 1.

3. Otherwise, the items are retained.

An Algorithm for Determining the Size of Assessment Subtests

Before this algorithm was developed, the size of the assessment
subtests (M) had to be specified by the user prior to applying Stout's
procedure for assessing dimensionality.

The proposed algorithm mechanically determines the size of assessment
subtests AT1 and AT2 based on the magnitude of item loadings on the second
factor.

Monte Carlo Simulation Studies 

A large-scale simulation study was conducted in both the d = 1 and d = 2
cases. The purpose was to establish that Stout's procedure, after undergoing
correction for increased bias and using the new algorithm to determine the
size of AT1 and AT2, provides strong empirical evidence of the utility of
the statistical test in a variety of test settings. More precisely, two
issues were addressed:

(a) how well the nominal level of significance specified by the user
($\alpha$ = .05) is approximated by the observed level of significance, and

(b) how well the power of the statistical test is maintained while
undergoing the above-mentioned changes.
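
When judging issue (a), it helps to keep the sampling error of a rejection rate in mind. The sketch below, with the hypothetical name `rejection_rate_summary`, treats the number of rejections in a set of independent trials as binomial and reports how far an observed rate lies from the nominal level in standard-error units. The binomial treatment is an assumption of this illustration, not an analysis reported in the study.

```python
import math


def rejection_rate_summary(rejections, trials=100, alpha=0.05):
    """Return (observed rate, standard error under the nominal level,
    standardized distance of the observed rate from the nominal level)."""
    rate = rejections / trials
    se = math.sqrt(alpha * (1.0 - alpha) / trials)
    return rate, se, (rate - alpha) / se


# With 100 trials at a nominal level of .05, the standard error is about
# .022, so observed counts between roughly 1 and 9 rejections are within
# two standard errors of the nominal level.
print(rejection_rate_summary(5))
print(rejection_rate_summary(28))
```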
the standard normal distribution, $\alpha$ being the desired level of
significance.
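
The decision rule of Step 6 can be sketched as below, combining the statistic from the first assessment subtest with the bias-correction statistic from AT2 and comparing the result with the upper percentile of the standard normal distribution. The function name `unidimensionality_test` and the example values are hypothetical; `statistics.NormalDist` is part of the Python standard library (3.8 and later).

```python
import math
from statistics import NormalDist


def unidimensionality_test(t_l, t_b, alpha=0.05):
    """Return (T, critical value, reject H0: d = 1?)."""
    t = (t_l - t_b) / math.sqrt(2)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper 100(1 - alpha) percentile
    return t, z_alpha, t > z_alpha


# Hypothetical values of the two statistics.
print(unidimensionality_test(3.0, 0.5))
print(unidimensionality_test(1.0, 0.5))
```

The test is one-sided: only large positive values of T count as evidence against unidimensionality.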

Note: In all the simulation studies, a part of the sample is used to
perform factor analysis in order to place items into subtests AT1, AT2, and
PT in Step 1, and the rest of the sample is used to compute the statistic
in steps 2 through 6.
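
The sample split described in this note can be sketched as follows. The fraction of examinees set aside for factor analysis is not stated in the text, so the default `fa_fraction=0.5` and the function name `split_sample` are purely illustrative assumptions.

```python
import random


def split_sample(responses, fa_fraction=0.5, seed=0):
    """Randomly split examinees into a factor-analysis part (used to build
    the subtests) and a testing part (used to compute the statistic)."""
    idx = list(range(len(responses)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * fa_fraction)
    fa_part = [responses[i] for i in idx[:cut]]
    test_part = [responses[i] for i in idx[cut:]]
    return fa_part, test_part


# Ten placeholder examinees, 30 percent reserved for factor analysis.
fa_part, test_part = split_sample(list(range(10)), fa_fraction=0.3)
print(len(fa_part), len(test_part))
```

Keeping the two parts disjoint prevents the item selection from capitalizing on chance in the same data used for the significance test.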

A Limitation of Stout's Procedure 

The items of the SAT-Verbal test (confirmed as unidimensional) were divided
into two groups: one with items having a-parameter greater than 1.0
and the other with a-parameter less than 1.0. Stout's procedure was
applied to assess the dimensionality of each group as if it were a test in
itself. The results were markedly different.



Table 1

Rejection Rates per 100 Trials for d = 1 3PL Simulation Study Using
Estimated Item Parameters of SAT Verbal Test With $\alpha$ = 0.05

Discrimination                   Number of     Number of examinees
parameter                        items          750    1000    2000

low a's  (0 < $a_i$ < 1.0)         41              4       0       3

high a's (1.1 < $a_i$ < 2.0)       39             28      46      58



Reason for Increased Bias

In the Monte Carlo simulation studies, factor analysis was used as a data
analytic tool to select items into AT1. Using principal axis factor
analysis, items with high loadings of the same sign (either positive or
negative) on the second extracted factor are selected into AT1. In the case
of high a's, very easy items most often tended to have the highest loadings
in magnitude. Consequently, the easiest items were put into AT1. Stout's
procedure then, in an attempt to control for statistical bias due to short
test lengths, puts the remaining easy items into AT2. Therefore, PT was
left with the difficult items. Because examinees are grouped
according to their score on PT, consisting of difficult or very difficult
The following results illustrate that using the proposed bias
correction method together with the new algorithm to determine the size of
assessment subtests has completely eliminated the excess bias due to high
a's in Stout's statistic and improved the performance of Stout's procedure.

Table 2

Rejection Rates per 100 Trials for d = 1 3PL Simulation Study Using
Estimated Item Parameters of Respective Tests With $\alpha$ = 0.05

                                    TESTS

          SAT V    SAT V       ACT M    ACT E    ASVAB    ASVAB    ASVAB
  J                high a's                      AS       AR       GS

Items      50       39          40       50       25       30       25

 750        0        0           1        1        1        0        2

2000        0        0           1        2        1        0       13


Notes: Numbers in bold face (the Items row) represent the number of items
used in the simulation study of the respective test.

J denotes the number of examinees simulated.

SAT V denotes the Scholastic Aptitude Test for verbal.

SAT V high a's denotes the Scholastic Aptitude Test where items have high
discrimination parameter, namely 1.1 < $a_i$ < 2.0.

ACT M denotes the American College Test for mathematics usage.

ACT E denotes the American College Test for English usage.

ASVAB AS denotes the Armed Services Vocational Aptitude Battery for Auto
Shop information.

ASVAB AR denotes the Armed Services Vocational Aptitude Battery for
Arithmetic Reasoning.

ASVAB GS denotes the Armed Services Vocational Aptitude Battery for
General Science.
Table 3

Rejection Rates per 100 Trials for d = 2, c = 0.2 3PL Simulation Study
With $\alpha$ = 0.05

                                          TESTS*

Parameters              SAT V      ACT M      ACT E      ASVAB AS   ASVAB AR          ASVAB GS

Items                   50         40         50         25         30                25

$N_1 : N_2 : N_{12}$    17:17:16   13:13:14   17:17:16   8:8:9      10:10:10          8:8:9

J                       750 2000   750 2000   750 2000   750 2000   750 2000          750 2000

$\rho$ = 0.5            94  100    87  98     68  91     65  97     84  [illegible]   68  100

$\rho$ = 0.7            36  69     44  77     19  31     35  70     43  74            31  74


Notes: Numbers in bold face (the Items row) represent the number of items
used in the simulation study of the respective test.

* The two artificial dimensions have the same item parameter distribution as
the respective real test.

$N_1$ denotes the number of pure items of ability 1.

$N_2$ denotes the number of pure items of ability 2.

$N_{12}$ denotes the number of mixed items requiring the knowledge of both
ability 1 and ability 2.

$\rho$ denotes the correlation between the abilities.
J denotes the number of examinees.
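
For the d = 2 setting, examinees need a pair of abilities with a specified correlation. The sketch below draws such pairs from a standard bivariate normal distribution; the distributional choice, the function name `draw_abilities`, and the sample size are assumptions made for illustration and are not taken from the study.

```python
import math
import random


def draw_abilities(n, rho, seed=0):
    """Draw n pairs (theta_1, theta_2) of standard normal abilities whose
    population correlation is rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Mixing z1 into the second ability induces correlation rho
        # while keeping unit variance.
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return pairs


abilities = draw_abilities(20000, rho=0.5)
print(len(abilities))
```

A higher correlation between the abilities makes the two dimensions harder to tell apart, which is consistent with the lower rejection rates in the $\rho$ = 0.7 row of Table 3.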
Advantages of Stout's Procedure

1. Unlike most other procedures, Stout's statistic and procedure
grew directly out of a meaningful conceptual definition of test
dimensionality. That is, Stout's statistic T is designed to be sensitive
only to dominant dimensions and not to item idiosyncrasies.

2. The procedure is supported by an asymptotic theory.

3. It is nonparametric, thus avoiding the issue of parametric model
correctness.

4. A major advantage of the procedure from the practitioner's point
of view is that the computational requirements are modest and hence cost
effective. For example, on a CYBER 175 it takes 7 seconds to assess the
unidimensionality of a 30-item test and 20 seconds for a 50-item test.

5. Lastly, extensive Monte Carlo simulations for a wide variety of
test lengths and sample sizes, as can also be seen in
Stout (Psychometrika, Dec 87), strongly support the validity of the
procedure.






